Abstract. We consider the problem of random sampling for bandlimited
functions. When can a bandlimited function $f$ be recovered from randomly chosen
samples $f(x_j)$, $j \in J \subset \mathbb{N}$? We estimate the probability that a
sampling inequality of the form

$$A\|f\|_2^2 \le \sum_{j \in J} |f(x_j)|^2 \le B\|f\|_2^2$$

holds uniformly for all functions $f \in L^2(\mathbb{R}^d)$ with
$\operatorname{supp} \hat f \subseteq [-1/2,1/2]^d$ or for
some subset of bandlimited functions.

In contrast to discrete models, the space of bandlimited functions is
infinite-dimensional and its functions "live" on the unbounded set $\mathbb{R}^d$.
These facts raise new problems and lead to both negative and positive results.

(a) With probability one, the sampling inequality fails for any reasonable
definition of a random set on $\mathbb{R}^d$, e.g., for spatial Poisson processes or
uniform distribution over disjoint cubes.

(b) With overwhelming probability, the sampling inequality holds for certain 
compact subsets of the space of bandlimited functions and for sufficiently large 
sampling size. 



1. Introduction 

The sampling problem asks for the reconstruction or approximation of a function
$f$ from its sampled values $\{f(x_j) : j \in J\}$ on some set
$X = \{x_j\} \subset \mathbb{R}^d$. In other words, one wants to recover $f$ from
given samples $f(x_j)$. This is a many-faceted problem that spreads over many
areas of mathematics, engineering, and data processing.

We will impose the standard hypothesis that $f$ is bandlimited. In signal
processing this is a realistic assumption, because it amounts to assuming a maximum
frequency. The assumption is also relevant in complex analysis because a
bandlimited function is just the restriction of an entire function of exponential
growth from $\mathbb{C}^d$ to $\mathbb{R}^d$. The space of bandlimited functions is
defined to be

$$\mathcal{B} = \{ f \in L^2(\mathbb{R}^d) : \operatorname{supp} \hat f \subseteq [-1/2,1/2]^d \},$$

where we have normalized the spectrum to be the unit cube and the Fourier
transform is normalized as
$\hat f(\xi) = \int_{\mathbb{R}^d} f(x) e^{-2\pi i x \cdot \xi}\,dx$.

The principal goal is to establish a sampling inequality of the form

$$(1)\qquad A\|f\|_2^2 \le \sum_j |f(x_j)|^2 \le B\|f\|_2^2 \qquad \forall f \in \mathcal{B}.$$

A set $\{x_j : j \in J\} \subset \mathbb{R}^d$ satisfying the sampling inequality (1)
is called a set of stable sampling or simply a set of sampling [19]. Once a
sampling inequality is established, every $f \in \mathcal{B}$ is uniquely determined
by its samples on $X$ and depends continuously on these samples.

Bandlimited functions in dimension $d = 1$ and $d > 1$ differ in a fundamental
way because of the nature of their zeros. In dimension $d = 1$ the zeros of an entire
function are always discrete, and there is a precise connection between the possible
density of zeros and the growth of $f$ [4,23,29]. By contrast, in higher dimensions,
the zero sets are analytic manifolds, and standard complex variable techniques
no longer apply. As a consequence, almost everything is known about the sampling
of bandlimited functions in dimension $d = 1$, but only a few results are known in
higher dimensions, most notably a strong result of Beurling [3].

The difficulties of the sampling of multivariate functions have motivated us to
turn to probabilistic techniques and to study random sampling. In this approach the
sampling set $X$ is a sequence of random variables $x_j = x_j(\omega)$ on some
probability space $(\Omega, \mathcal{F}, P)$, taking values in $\mathbb{R}^d$. The
sampling inequality (1) defines an event on $\Omega$, and the goal is to estimate
the probability that a random set is a set of sampling.

This point of view has worked successfully in our previous work [1] where we have 
studied the random sampling of multivariate trigonometric polynomials. We were 
able to show that some popular numerical algorithms [28] work with "overwhelm- 
ing" probability. In a similar spirit, Candes, Romberg, and Tao [6,7] have recently 
investigated sparse trigonometric polynomials and their reconstruction from a few 
random samples. The more general context of mathematical learning theory has 
been studied by Cucker, Poggio, Smale, and Zhou [11,27,34]. In [34] sampling 
in general reproducing kernel Hilbert spaces was studied under the assumption 
of "rich data." This amounts to assuming the validity of a sampling inequality. 
By contrast, our interest is to establish the probability that this basic assumption 
holds. The common technical point in these approaches [1,6,11] is the estimate of 
entropy and covering numbers and a metric entropy argument. 

The first contributions to random sampling of bandlimited functions were
perturbation results in dimension $d = 1$. Seip and Ulanovsky [30] investigated
random perturbations of regular sampling $\{j + \delta_j : j \in \mathbb{Z}\}$, where
$\delta_j$ is a sequence of i.i.d. random variables. Chistyakov, Lyubarskii, and
Pastur [9, 10] studied the more general problem of perturbation of arbitrary Riesz
bases of exponentials. These contributions are based on the precise
characterization of sampling sets in dimension $d = 1$ [29], and the proofs proceed
by estimating the probability that a deterministic condition is satisfied.

For random sampling of bandlimited functions of several variables new types of
problems arise.

(a) One cannot fall back on deterministic results in higher dimensions, because
sampling theory is not nearly as developed as in dimension $d = 1$. In fact, this is
the very reason why we aim for purely probabilistic results.

(b) The space of bandlimited functions $\mathcal{B}$ is infinite-dimensional, in
contrast to trigonometric polynomials of given degree or sparsity. Thus random
matrix techniques as used in [15, 25] are not applicable.

(c) The configuration space $\mathbb{R}^d$ is non-compact and unbounded, again in
contrast to trigonometric polynomials that "live" on the torus $[0,1]^d$. This raises
the question of how to model a sequence of random points in $\mathbb{R}^d$. On a
compact set of positive (Lebesgue) measure the natural notion is that of an
independent identically distributed (i.i.d.) sequence of points with uniform
distribution. On $\mathbb{R}^d$ there are several natural choices. We will consider
two such choices: uniform distributions on disjoint cubes and spatial Poisson
processes.

We will prove that for these two concepts of "randomly distributed points on
$\mathbb{R}^d$" the sampling inequality (1) must fail almost surely (Propositions 2.2
and 2.3). These results come as a surprise to the analyst, but are perhaps more
natural for the probabilist. The reasons for the failure of a sampling inequality
are either the zeros of entire functions or large holes in the sampling set. In the
model of uniform distribution over disjoint cubes, many samples may be near the
zeros of a bandlimited function with positive probability. In other words, the lower
bound in (1) is small.

In the other model (spatial Poisson process) we show that, with positive
probability, there are large holes in the sampling set, which again implies a small
lower bound in (1).

To obtain insight into the formulation of positive results, we argue in a practical
manner. Realistically one can sample $f$ only on a bounded set; furthermore, every
bandlimited function vanishes at infinity, thus samples far out do not contribute
anything significant to a sampling inequality. We can learn about $f \in \mathcal{B}$
only if the samples are taken in the "essential support" of $f$, i.e., the set where
most of the $L^2$-norm is localized. Thus we will study the subset

$$\mathcal{B}(R,\delta) = \Big\{ f \in \mathcal{B} : \int_{[-R/2,R/2]^d} |f(x)|^2\,dx \ge (1-\delta)\|f\|_2^2 \Big\}$$

of bandlimited functions. This subset is compact in $\mathcal{B}$ and thus somewhat
resembles a finite-dimensional subspace. Since $f \in \mathcal{B}(R,\delta)$ is small
outside the cube $[-R/2,R/2]^d$, it should suffice to sample $f$ on the relevant
cube. In this way, we are back to a compact configuration space and an almost
finite-dimensional function space. Our main result (Theorem 3.1) is a restricted
sampling inequality for the subset $\mathcal{B}(R,\delta)$. The proof is a combination
of analytic and probabilistic techniques. On the one hand, we will use detailed
properties of the spectrum of time-limiting operators on bandlimited functions by
Widom [35]; on the other hand, we use the metric entropy method (see, e.g., [12]).

The paper is organized as follows. In Section 2 we discuss two natural models
for random sequences in $\mathbb{R}^d$ and show that, with probability one, they fail
to produce sets of stable sampling. In Section 3 we restrict the attention to a
subset of bandlimited functions and show that on this subset a sampling inequality
holds with overwhelming probability. The proof of this result is contained in
Section 4. In order to set up the metric entropy method, we discuss the spectrum of
time-frequency limiting operators and covering numbers. We mention that we use two
distinct inequalities of Bernstein, one from Fourier analysis bounding the
$L^\infty$ norm of the derivative, and the other from probability giving estimates
for the sums of independent random variables.

Acknowledgement. We would like to thank the anonymous referee for his 
useful comments and pointing out an embarrassing error in the first version. 

2. Negative Results 

In the case of multivariate trigonometric polynomials, we showed that if one 
chose points independently and uniformly distributed over the state space, then 
one could recover the trigonometric polynomial exactly provided only that one had 
at least as many sample points as the dimension [1, Thm 3.2]. 

We first show that this is far from the case for bandlimited functions. The 
difficulty is that the state space is not compact. 

We first recall a fundamental necessary condition of Landau for a set of sampling.
Let

$$(2)\qquad D^-(X) = \lim_{R \to \infty} \min_{y \in \mathbb{R}^d} \frac{\operatorname{card}\, X \cap (y + [0,R]^d)}{R^d}$$

be the (lower) Beurling density of a set $X \subset \mathbb{R}^d$.

Proposition 2.1. Assume that $X = \{x_j\}$ is a set of stable sampling for
$\mathcal{B}$. Then $X$ must have the following properties:

(i) $D^-(X) \ge 1$; in particular there is $R > 0$ such that every cube of side
length $R$ contains a sampling point, i.e., $X \cap (x + [0,R]^d) \ne \emptyset$ for
all $x \in \mathbb{R}^d$;

(ii) the number of samples in any cube of side length 1 is bounded,
$\max_{y \in \mathbb{R}^d} \operatorname{card}\, X \cap (y + [0,1]^d) < \infty$.

A sufficient condition is the following: In dimension $d = 1$, if $D^-(X) > 1$ and
$\inf_{j \ne k} |x_j - x_k| > 0$, then $X$ is a set of sampling.

Proof. (i) is the result of Landau [19] and has been re-derived in [16] for discrete
sampling sets; the general case is an easy extension.

(ii) is an easy consequence of the finiteness of the upper bound $B$ in (1).

The sufficient condition in dimension $d = 1$ is usually attributed to Beurling and
treated in detail by Seip [29]. □

Loosely speaking, a set of stable sampling must be dense enough and cannot 
have arbitrarily large "holes". 
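
The density condition can be explored numerically on a finite window. The following is a minimal sketch, not part of the original argument: it evaluates the quantity $\operatorname{card}(X \cap (y+[0,R]^d))/R^d$ over a grid of corners $y$ for one fixed $R$ and returns the minimum. The function name `min_count_density`, the truncation to a window $[-L,L]^d$, and the grid step are assumptions made for illustration; the true Beurling density involves a limit $R \to \infty$ over all of $\mathbb{R}^d$.

```python
import numpy as np

def min_count_density(points, R, window, step):
    """Smallest value of card(X ∩ (y + [0, R]^d)) / R^d over a grid of corners y.

    points : (n, d) array of sample locations
    window : half-width L; corners y are restricted so that y + [0, R]^d
             stays inside [-L, L]^d (a finite stand-in for all of R^d)
    step   : grid spacing for the corners y
    """
    points = np.asarray(points, dtype=float)
    d = points.shape[1]
    axis = np.arange(-window, window - R + 1e-12, step)
    corners = np.stack(np.meshgrid(*([axis] * d), indexing="ij"), axis=-1).reshape(-1, d)
    best = np.inf
    for y in corners:
        inside = np.all((points >= y) & (points <= y + R), axis=1)
        best = min(best, inside.sum() / R**d)
    return best

# Integer lattice in d = 1: every interval [y, y + R] holds about R points.
lattice = np.arange(-50, 51, dtype=float).reshape(-1, 1)
density_lattice = min_count_density(lattice, R=10.0, window=50.0, step=0.25)

# Same lattice with the points 0..19 removed: a hole longer than R.
holed = lattice[(lattice[:, 0] < 0) | (lattice[:, 0] >= 20)]
density_holed = min_count_density(holed, R=10.0, window=50.0, step=0.25)
```

In this toy setting the lattice keeps its minimal count density at the critical value, while a single hole of length larger than $R$ drives it to zero, which is the "large hole" obstruction described above.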

We now consider random sampling sets. Let our probability space be
$(\Omega, \mathcal{F}, P)$ and denote points in $\Omega$ by $\omega$. When sampling a
function $f$ randomly, we consider its samples $f(x_j)$ on a sequence of random
points $x_j = x_j(\omega)$. Clearly, a sequence of random points need not have the
sufficient density stated in Proposition 2.1. However, if the process is designed
to yield only random sets with $D^-(X) > 1$, one could hope that a generic random
set with the necessary density would be a set of stable sampling. This intuition is
completely false, as we will show in the next sections.

2.1. Uniform distribution on large disjoint cubes. There are various ways
in which one could choose points randomly in $\mathbb{R}^d$. As a first model we
partition $\mathbb{R}^d$ into disjoint cubes $k + [0,1]^d$, $k \in \mathbb{Z}^d$, and,
in each cube, we choose $r$ points independently and uniformly distributed over
$k + [0,1]^d$. Let $X$ be the collection of sample points; $X$ is a random set and
thus depends on $\omega$. Clearly $D^-(X) = r$ almost surely, so one may expect that
$X(\omega)$ is a set of stable sampling with high probability.
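
As an illustration of this first model, the sketch below draws $r$ independent uniform points in each unit cube of a finite block of the partition. It is only a simulation aid under stated assumptions: the function name `sample_disjoint_cubes`, the truncation to cubes with $k \in \{-h,\dots,h-1\}^d$, and the use of NumPy's random generator are illustrative choices, not part of the paper.

```python
import numpy as np

def sample_disjoint_cubes(half_width, d, r, rng):
    """Draw r i.i.d. uniform points in every unit cube k + [0, 1]^d,
    for k in {-half_width, ..., half_width - 1}^d (a finite truncation)."""
    axis = np.arange(-half_width, half_width)
    corners = np.stack(np.meshgrid(*([axis] * d), indexing="ij"), axis=-1).reshape(-1, d)
    offsets = rng.random((corners.shape[0], r, d))   # uniform in [0, 1)^d
    return (corners[:, None, :] + offsets).reshape(-1, d)

rng = np.random.default_rng(0)
X = sample_disjoint_cubes(half_width=5, d=2, r=3, rng=rng)

# Count the points that fall in each unit cube of the block [-5, 5]^2.
counts = np.histogramdd(X, bins=[np.arange(-5, 6)] * 2)[0]
```

Every cube receives exactly $r$ points, so the density is fixed by construction; what remains random is where inside each cube the points land, and that is the freedom the negative result below exploits.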

Our first result says that one cannot obtain a sampling inequality.

Proposition 2.2. Let $r \ge 1$ be the number of random samples in each cube
$k + [0,1]^d$. With probability one the following holds:
For each $k > 0$ there is a function $f_k \in \mathcal{B}$ such that

$$\sum_{x_j \in X} |f_k(x_j)|^2 \le \frac{1}{k}\|f_k\|_2^2.$$

The function $f_k$ will necessarily depend on $\omega$.

Consequently, a sampling inequality of the form (1) is violated almost surely.

Proof. For notational simplicity we give the proof only in dimension $d = 1$; the
case of several variables is treated similarly.
Let

$$g(x) = \frac{\sin(\pi x/2)}{\pi x/2},$$

and let $\psi$ be a nonnegative $C^\infty$ function with support in $[-1/4,1/4]$ such
that $\psi = 1$ on $[-1/8,1/8]$. Let $\Psi$ be the inverse Fourier transform of $\psi$
and define $F(x) = g(x)\Psi(x)$.

Since $\psi$ and thus $\Psi$ are in the Schwartz class, $F$ is in $L^2$, decays
rapidly, and there exists a constant $c_1$ such that $|F(x)| \le c_1/(1 + |x|^2)$. The
Fourier transform of $F$ is $\hat g * \psi$, so the support of $\hat F$ lies in
$[-1/2,1/2]$, i.e., $F \in \mathcal{B}$. Since $F$ is bounded, by Bernstein's
inequality, $F'$ is also bounded, say, by $c_2 := \|F'\|_\infty$.

Choose $N$ a large even integer so that

$$(3)\qquad \sum_{|j| > N/2} \frac{c_1^2\, r}{(1 + (|j| - 1)^2)^2} \le \frac{\|F\|_2^2}{4k}.$$

Choose $\delta > 0$ small so that

$$(4)\qquad 2 c_2 N r \delta \le \frac{\|F\|_2^2}{4k}.$$

Let $A_j$ be the event that in the interval $[j, j+1]$ all $r$ points that were chosen
randomly lie within $(j, j + \delta)$ if $j$ is even and within $(j + 1 - \delta, j + 1)$
if $j$ is odd. The events $A_j$ are independent and the probability of $A_j$ is
$\delta^r$.

Let $B = \bigcap_{j=-N}^{N} A_j$ be the event that the samples in $[-N, N]$ are in a
$\delta$-neighborhood of the even integers. By independence, the probability of $B$
is $(\delta^r)^{2N+1}$. If $\omega \in B$, then using (3) and our bound on $F$,

$$\sum_{x_i \in X(\omega) \setminus [-N,N]} |F(x_i)|^2 \le \sum_{|j| > N/2} \frac{c_1^2\, r}{(1 + (|j| - 1)^2)^2} \le \frac{\|F\|_2^2}{4k}.$$

By construction, $F(2j) = 0$ for $j \in \mathbb{Z}$, and so using the bound on $F'$, we
have $|F(x)| \le c_2 \delta$ if $|x - 2j| < \delta$. Therefore if $\omega \in B$, then

$$\sum_{x_i \in X(\omega) \cap [-N,N]} |F(x_i)|^2 \le 2 c_2 N r \delta \le \frac{\|F\|_2^2}{4k}.$$

Combining, if $\omega \in B$, then

$$(5)\qquad \sum_{x_i \in X(\omega)} |F(x_i)|^2 \le \frac{\|F\|_2^2}{2k}.$$



Now let $C_m = \bigcap_{j=3mN-N}^{3mN+N} A_j$. Clearly the probability of $C_m$ is the
same as the probability of $B$. So $\sum_{m=1}^{\infty} P(C_m) = \infty$. By
independence and the Borel-Cantelli lemma, with probability one, $C_m$ occurs for
infinitely many $m$. If $\omega \in C_m$, let $f_k(x) = F(x - 3mN)$. Clearly,
$f_k \in \mathcal{B}$ and the same bounds $c_1$ and $c_2$ hold for $f_k$ as for $F$,
provided translation is taken into account. As in (5),

$$\sum_{x_i \in X(\omega)} |f_k(x_i)|^2 \le \frac{1}{2k}\|f_k\|_2^2.$$

Thus we have proved that, with probability 1, $X$ fails to be a set of stable
sampling. □

2.2. Spatial Poisson processes. Another scheme of choosing points randomly in
$\mathbb{R}^d$ is the spatial Poisson process $X$. This means that for some
(intensity) function $\lambda : \mathbb{R}^d \to [0, \infty)$ and for any Borel subset
$A$ of $\mathbb{R}^d$, the number of points in $X \cap A$ is a Poisson random variable
with parameter $\int_A \lambda(x)\,dx$. If $A_1, \dots, A_n$ are disjoint sets, then
the numbers of points in $X \cap A_i$ are independent random variables.

The most natural case is where $\lambda(x)$ is a constant, $\lambda(x) = \rho$, say.
Then the expected Beurling density of $X$ is $\rho$. Again one might think that $X$
is a set of stable sampling with high probability. However, as in Proposition 2.2,
one cannot get the sampling inequality. In fact, a stronger result is true. One can
choose points at a higher rate further from the origin and still have the sampling
inequality failing.
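
To get a feeling for why holes appear, one can simulate the constant-intensity case on a bounded box. The sketch below is an illustration under stated assumptions: `sample_poisson_process` and `largest_gap_1d` are hypothetical helper names, the process is restricted to $[0, \text{side}]^d$, and the gap statistic is computed only in dimension $d = 1$. It uses the standard construction of a homogeneous Poisson process (a Poisson number of points, placed independently and uniformly).

```python
import numpy as np

def sample_poisson_process(rho, side, d, rng):
    """Homogeneous Poisson process with intensity rho on the box [0, side]^d."""
    n = rng.poisson(rho * side**d)          # number of points is Poisson(rho * volume)
    return rng.random((n, d)) * side        # given n, points are i.i.d. uniform

def largest_gap_1d(points, side):
    """Length of the longest subinterval of [0, side] containing no point (d = 1)."""
    xs = np.sort(np.concatenate(([0.0], points.ravel(), [side])))
    return float(np.max(np.diff(xs)))

rng = np.random.default_rng(1)
for side in (1e2, 1e3, 1e4):
    X = sample_poisson_process(rho=2.0, side=side, d=1, rng=rng)
    print(side, len(X), largest_gap_1d(X, side))
```

Although the average density stays at $\rho$, the largest empty interval tends to grow as the window grows, which is the phenomenon Proposition 2.3 makes precise.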

Proposition 2.3. Suppose $X$ is a spatial Poisson process with
$\lambda(x) = o(1 + \log^+(|x|))$. Then with probability one, the sampling inequality
(1) fails for every subset $Y$ of $X$.

Proof. Under the hypothesis on $\lambda$, the Beurling density of $X$ may be
infinite. In this case, $X$ contains too many samples and the upper bound in the
sampling inequality (1) will fail to hold. This problem could be fixed by extracting
a subset $Y$ of $X$ that satisfies the necessary conditions of Proposition 2.1. So
one may still hope that a subsequence $Y$ may yield a set of stable sampling.

However, we will show that with probability one, for each $k > 0$ there exists a
cube of side length $k$ that contains no point of $X$. Since the maximal hole of a
set of stable sampling is bounded by Proposition 2.1(i), with probability one, no
subset of $X$ can be a set of stable sampling.

The probability that a Poisson random variable with parameter $\lambda$ is equal to
zero is $e^{-\lambda}$. If $X_i$, $i = 1, \dots, n$, are independent Poisson variables
with parameters $\lambda_i$, resp., then

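$$\begin{aligned}
P(\text{at least one } X_i \text{ is zero}) &= 1 - P(X_1 \ne 0, \dots, X_n \ne 0) \\
&= 1 - \prod_{i=1}^{n} P(X_i \ne 0) = 1 - \prod_{i=1}^{n} (1 - e^{-\lambda_i}) \\
&= 1 - \exp\Big( \sum_{i=1}^{n} \log(1 - e^{-\lambda_i}) \Big) \\
&\ge 1 - \exp\Big( -\sum_{i=1}^{n} e^{-\lambda_i} \Big).
\end{aligned}$$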
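
The last step uses only $\log(1 - x) \le -x$ for $0 \le x < 1$. As a sanity check, the following sketch evaluates both sides for given Poisson parameters; the function names `prob_some_zero` and `lower_bound` and the sample parameter values are assumptions chosen for illustration.

```python
import numpy as np

def prob_some_zero(lams):
    """Exact probability that at least one of the independent Poisson(lam_i)
    variables equals zero: 1 - prod_i (1 - exp(-lam_i))."""
    lams = np.asarray(lams, dtype=float)
    return 1.0 - np.exp(np.sum(np.log1p(-np.exp(-lams))))

def lower_bound(lams):
    """Lower bound 1 - exp(-sum_i exp(-lam_i)) from log(1 - x) <= -x."""
    lams = np.asarray(lams, dtype=float)
    return 1.0 - np.exp(-np.sum(np.exp(-lams)))

lams = np.full(50, 3.0)          # fifty cubes, each with Poisson parameter 3
print(prob_some_zero(lams), lower_bound(lams))
```

The bound is convenient because the sum $\sum_i e^{-\lambda_i}$ is easy to estimate when all parameters are controlled by a common quantity, as happens in the next step of the proof.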



Let $\epsilon > 0$ be chosen later. Choose $m_0 > k$ large so that
$\lambda(x) \le \epsilon \log(|x|)$ if $|x| \ge m_0$. For each $m \ge m_0$ we can find
at least $m$ disjoint cubes of side length $k$ lying in
$B(0, 3mk) \setminus B(0, 2mk)$; call them $C_{m1}, \dots, C_{mm}$. The number of
points in $X$ lying in any one of the $C_{mj}$ is a Poisson random variable with
parameter less than $c_1 \epsilon (\log m) k^d$. So by the above, the probability
that at least one of the $C_{mj}$ is empty is greater than

$$1 - \exp\big( -m\, e^{-c_1 \epsilon (\log m) k^d} \big).$$

If we choose $\epsilon$ so that $c_1 \epsilon k^d < 1/2$, then the above probability is
greater than

$$1 - \exp\big( -m^{1/2} \big),$$

which will be greater than 1/2 if $m$ is large enough.

Let $D_m$ be the event that at least one of the cubes $C_{mj}$, $j = 1, \dots, m$, is
empty. For $m$ large, $P(D_m) > 1/2$, and the $D_m$ are independent. So by the
Borel-Cantelli lemma the event $D_m$ happens for infinitely many $m$, with
probability 1. In particular, there must be at least one cube of side length $k$
with no points of $X$ in it. □

On the other hand, the rate of growth $\log^+(|x|)$ is critical. If the intensity
function $\lambda$ grows faster than some multiple of $\log^+(|x|)$, then the random
sequence $X$ cannot have large holes.

Proposition 2.4. Suppose $X$ is a spatial Poisson process with intensity
$\lambda(x) \ge c_0(1 + \log^+(|x|))$ for all $x$. Fix $a > 0$. If
$c_0 > (d+1)/a^d$, then with probability one, there exists $R > 0$, such that every
cube $ak + [0,a]^d$ with $a|k| \ge R$ contains at least one point of $X$.

Proof. Let $\mathcal{S}_a$ be the collection of all cubes of the form $ak + [0,a]^d$,
$k \in \mathbb{Z}^d$. We will show that with probability one, all but finitely many
cubes in $\mathcal{S}_a$ contain at least one point of $X$.

Let $C_k$ be the event that the cube $A = ak + [0,a]^d$ contains no point of $X$. If
$a|k| \ge N$, then $\lambda(x) \ge c_0 \log N$ on $A$, and thus
$\lambda(A) \ge c_0 a^d \log N$. Thus for $a|k| \ge N$, the probability that this cube
is empty is

$$P(C_k) = e^{-\lambda(A)} \le e^{-c_0 a^d \log N} = N^{-c_0 a^d}.$$

If we choose $c_0 a^d > d + 1$, then $\sum_{k \in \mathbb{Z}^d} P(C_k) < \infty$. Then by
the Borel-Cantelli lemma, the probability that infinitely many of the cubes are
empty is 0. Therefore from some $R$ on (depending on $\omega$), all cubes in
$\mathcal{S}_a$ that are at least $R$ from the origin are nonempty. □



3. A Positive Result: Relevant Sampling 

The key to the arguments in Section 2 was that random sampling sets have
either arbitrarily large holes or can be concentrated near the zeros of a bandlimited
function. In the former case we then constructed a class of functions whose main
energy is concentrated on the "hole"; in the latter case we constructed a class of
functions with prescribed zeros. These classes then violate the sampling inequality.

To obtain positive results we change the focus. Since for no reasonable random
sampling set does the norm equivalence (1) hold with positive probability for all
bandlimited functions, we will restrict the class of functions for which we ask (1)
to hold. The natural idea is to sample a given $f$ in the region where a significant
part of the energy is located. In other words, we sample in the region of relevant
values.

This idea motivates the following definition. Let $C_R = [-R/2, R/2]^d$ be the cube
of side length $R$ centered at the origin. Its volume is $\operatorname{vol} C_R = R^d$.

Definition 1. Fix a large number $R > 0$ and a small $\delta \in (0, 1)$. Set

$$(6)\qquad \mathcal{B}(R,\delta) = \Big\{ f \in \mathcal{B} : \int_{C_R} |f(x)|^2\,dx \ge (1-\delta)\|f\|_2^2 \Big\}$$

and

$$(7)\qquad \mathcal{B}(R,\delta) = \Big\{ f \in \mathcal{B} : \|f\|_2 = 1 \text{ and } \int_{C_R} |f(x)|^2\,dx \ge 1 - \delta \Big\}.$$

Then $\mathcal{B}(R,\delta)$ is the subset of $\mathcal{B}$ consisting of those
bandlimited functions whose energy is largely concentrated on the cube $C_R$. Only a
fraction $\delta$ of the total energy is outside this cube. We note that
$\mathcal{B}(R,\delta)$ may be empty when $\delta$ is chosen too small. (For an
estimate of $\delta$ such that $\mathcal{B}(R,\delta) \ne \emptyset$, see Section 3.1.)
In the following we assume that $\mathcal{B}(R,\delta)$ is non-empty.

It now makes sense to sample such $f$ on the cube $C_R$ and to expect that these
samples are relevant and capture the main features of $f$.

Indeed, we will prove the following result.

Theorem 3.1. Assume that $\{x_j : j \in \mathbb{N}\}$ is a sequence of i.i.d. random
variables that are uniformly distributed over the cube $C_R = [-R/2, R/2]^d$ and
$0 < \mu < 1 - \delta$. Then there exist $A, B > 0$ such that the sampling inequality

$$(8)\qquad \frac{r}{R^d}(1 - \delta - \mu)\|f\|_2^2 \le \sum_{j=1}^{r} |f(x_j)|^2 \le \frac{r}{R^d}(1 + \mu)\|f\|_2^2 \qquad \forall f \in \mathcal{B}(R,\delta)$$

holds with probability at least

$$1 - 2A \exp\Big( -B\, \frac{r}{R^d}\, \frac{\mu^2}{41 + \mu} \Big).$$

The constant B can be taken to be B = ^P. For large R and sufficiently large 
sampling size r the constant A can be chosen of order A = exp(CR d ) with C 
depending on the dimension d. 

3.1. Discussion and Open Problems. 1. We emphasize that the exponential
probability inequality holds uniformly for all $f \in \mathcal{B}(R,\delta)$. By
contrast, for fixed $f$ such an inequality could be derived much more simply from
standard limit theorems.
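
The fixed-$f$ statement can be illustrated by a short simulation. The sketch below is an illustration only, under the following assumptions: dimension $d = 1$, the test function $f(x) = \sin(\pi x)/(\pi x)$ (which lies in $\mathcal{B}$ and has unit $L^2$-norm), and hypothetical helper names `local_energy` and `random_sample_energy`. It compares the rescaled random sum $(R/r)\sum_j |f(x_j)|^2$ with the local energy $\int_{C_R} |f|^2$, which is its expectation when the $x_j$ are uniform on $C_R$.

```python
import numpy as np

def local_energy(f, R, n_grid=200_001):
    """Midpoint-rule approximation of the integral of |f|^2 over [-R/2, R/2]."""
    edges = np.linspace(-R / 2, R / 2, n_grid)
    mid = 0.5 * (edges[:-1] + edges[1:])
    return float(np.mean(np.abs(f(mid)) ** 2) * R)

def random_sample_energy(f, R, r, rng):
    """(R / r) * sum_j |f(x_j)|^2 for r i.i.d. uniform points x_j in [-R/2, R/2]."""
    x = rng.uniform(-R / 2, R / 2, size=r)
    return float(R / r * np.sum(np.abs(f(x)) ** 2))

f = np.sinc                       # np.sinc(x) = sin(pi x) / (pi x)
R = 8.0
rng = np.random.default_rng(0)
exact = local_energy(f, R)
for r in (10**2, 10**4, 10**6):
    print(r, random_sample_energy(f, R, r, rng), exact)
```

For a single function the law of large numbers already gives convergence; the content of the theorem is that the deviation is controlled simultaneously for every function in the class.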

2. Theorem 3.1 is an asymptotic result. It is effective only for sufficiently
large sampling sizes. To achieve (8) with a probability exceeding $1 - \epsilon$, we
need $2A \exp\big( -B \frac{r}{R^d} \frac{\mu^2}{41 + \mu} \big) \le \epsilon$ or

$$(9)\qquad r \ge \frac{R^d (41 + \mu)}{B \mu^2} \Big( \log\frac{2}{\epsilon} + C R^d \Big) = O(R^{2d}).$$

Since $\mathcal{B}(R,\delta)$ sits in a space of approximate dimension $D = R^d$, we
need $O(D^2)$ samples to recover every $f \in \mathcal{B}(R,\delta)$. In
finite-dimensional problems, for instance, when sampling trigonometric polynomials
of fixed degree, one can often use random matrix techniques to show that the
effective number of samples is in fact of the order $O(D \log D)$ [15]. It is open
whether this bound is achievable for bandlimited functions in
$\mathcal{B}(R,\delta)$.

3. The sampling inequality (8) states that every $f \in \mathcal{B}(R,\delta)$ is
uniquely determined by a sufficient, but finite number of samples in
$[-R/2, R/2]^d$. This may seem paradoxical at first glance, because the set of
bandlimited functions such that $f(x_j) = 0$ for $j = 1, \dots, r$, is an
infinite-dimensional subspace of $\mathcal{B}$. However, as we assume that $f$ is
essentially supported on the cube $[-R/2, R/2]^d$, this means that $f$ must take
large values there. If $f(x_j) = 0$ for sufficiently many $x_j \in [-R/2, R/2]^d$,
then $f$ would oscillate and thus have a large derivative. But this would contradict
the bandlimitedness, which implies that the derivatives of $f$ are bounded by
$\pi$. While Theorem 3.1 is a probabilistic result, it seems possible to also prove a
deterministic sampling inequality (8) for $\mathcal{B}(R,\delta)$, at least in
dimension $d = 1$.

4. We emphasize that $\mathcal{B}(R,\delta)$ is not a subspace. This means that the
frame algorithm (a linear reconstruction method) [13] cannot be used to recover $f$
from its samples. Likewise, the projection-onto-convex-sets (POCS) method cannot be
applied, because $\mathcal{B}(R,\delta)$ is not convex. Although (8) determines each
$f \in \mathcal{B}(R,\delta)$ uniquely, currently we do not have an explicit
reconstruction algorithm to recover $f$ from its relevant samples.

4. Proof of Theorem 3.1

Theorem 3.1 will be a consequence of a large deviation inequality that holds
uniformly over the whole class $\mathcal{B}(R,\delta)$ and will be proved in the
following sections.

4.1. Time-Frequency Limiting Operators. Let $P_R$ and $Q$ be the projection
operators defined by

$$(10)\qquad P_R f = \chi_{C_R} f \quad\text{and}\quad Qf = \mathcal{F}^{-1}\big( \chi_{[-1/2,1/2]^d}\, \hat f \big),$$

where $\mathcal{F}^{-1}$ is the inverse Fourier transform. Then $Q$ is the orthogonal
projection from $L^2(\mathbb{R}^d)$ onto $\mathcal{B}$ and $P_R$ is the restriction of
a function to the cube $C_R$. The composition

$$(11)\qquad A_R = Q P_R Q$$

is the operator of time and frequency limiting. This operator has been studied
in detail by Landau, Slepian, Pollak [20,21,31-33] and many others. It encodes
many deep properties of bandlimited functions and their restrictions. In particular,
$A_R$ is a compact positive operator of trace class with a precisely known
eigenvalue distribution.

We summarize the properties of the spectrum that will be needed in the sequel.
Let $A_R^{(1)}$ denote the operator of time-frequency limiting in dimension $d = 1$.
Explicitly, $A_R^{(1)}$ is defined on $L^2(\mathbb{R})$ by the formula

$$\big( A_R^{(1)} f \big)^{\wedge}(\xi) = \int_{-1/2}^{1/2} \frac{\sin \pi R(\xi - \eta)}{\pi(\xi - \eta)}\, \hat f(\eta)\, d\eta \qquad \text{for } |\xi| \le 1/2.$$

We denote its eigenvalues by $\mu_k = \mu_k(R)$ in decreasing order and indicate the
dependence on $R$. Then the first $[R]$ eigenvalues are approximately 1, followed
by a "plunge region" of thickness $O(\log R)$ after which the remaining eigenvalues
are almost zero. Precisely, $\mu_{[R]+1}(R) \le 1/2 \le \mu_{[R]-1}(R)$; see [18]. This
behavior of the eigenvalues is usually formulated by saying that functions with
spectrum $[-1/2, 1/2]$ and "essential" support on $[-R/2, R/2]$ form a
finite-dimensional subspace of "approximate" dimension $R$. In particular, we may
think of $\mathcal{B}(R,\delta)$ as a subset of a finite-dimensional space of
dimension $R$.
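
The eigenvalue profile can be observed numerically. The sketch below is a rough illustration, not the method of the paper: it uses the fact that $Q P_R Q$ and $P_R Q P_R$ share their nonzero eigenvalues, and that in dimension one $P_R Q P_R$ is the integral operator on $L^2(-R/2, R/2)$ with kernel $\sin(\pi(x-y))/(\pi(x-y))$. This operator is discretized by a Nyström scheme with Gauss-Legendre quadrature; the function name `time_frequency_eigenvalues`, the number of nodes, and the thresholds are assumptions.

```python
import numpy as np

def time_frequency_eigenvalues(R, n_nodes=200):
    """Nystrom approximation of the eigenvalues of the integral operator on
    L^2(-R/2, R/2) with kernel sin(pi (x - y)) / (pi (x - y))."""
    t, w = np.polynomial.legendre.leggauss(n_nodes)   # nodes and weights on [-1, 1]
    x = 0.5 * R * t                                   # rescale nodes to [-R/2, R/2]
    w = 0.5 * R * w                                   # rescale weights accordingly
    K = np.sinc(x[:, None] - x[None, :])              # np.sinc(u) = sin(pi u) / (pi u)
    M = np.sqrt(w)[:, None] * K * np.sqrt(w)[None, :] # symmetrized quadrature matrix
    return np.sort(np.linalg.eigvalsh(M))[::-1]       # eigenvalues in decreasing order

R = 8.0
mu = time_frequency_eigenvalues(R)
n_large = int(np.sum(mu > 0.5))                           # eigenvalues above 1/2
n_plunge = int(np.sum((mu > 1e-3) & (mu < 1 - 1e-3)))     # eigenvalues in the transition
print(n_large, n_plunge, mu[:12])
```

The count of eigenvalues above 1/2 stays close to $R$, and only a handful of eigenvalues sit strictly between 0 and 1, in line with the description of the plunge region.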

The precise asymptotic behavior of the $\mu_k$ for $k \to \infty$ was obtained by
Widom [35, Lemmas 1-3]: he showed that for large $k$



(12) w.0R)x2tt( 



TtRyk+i i 
If) kP 



where $a_k \asymp b_k$ means that $\lim_{k \to \infty} a_k / b_k = 1$. In particular,
(12) implies the super-exponential decay

/ Ik \ 

(13) fi k (R) < Cexp ( - 2Hog (-)) . 

We will use the following weaker exponential estimate.

Lemma 4.1. [35] Given $a > 0$ there exists a constant $\kappa > 0$, such that

$$(14)\qquad \mu_k(R) \le e^{-k/\kappa} \qquad \text{for } k \ge \frac{R}{1 - a}.$$

REMARK: This result is an asymptotic result for both $R \to \infty$ and
$k \to \infty$. We emphasize that the constant $\kappa$ depends only on $a$, but not
on $R$. (Widom works with the operator
$f \mapsto \int_{-1}^{1} \frac{\sin \gamma(x - y)}{\pi(x - y)} f(y)\,dy$, so a simple
dilation shows that we have to use $\gamma = \pi R/2$ to obtain $A_R^{(1)}$.)

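The largest eigenvalue $\mu_0$ of $A_R^{(1)}$ is the operator norm of $A_R^{(1)}$ and
is of size $\mu_0(R) = 1 - 2\pi\sqrt{2R}\, e^{-\pi R}(1 + O(R^{-1}))$ by a result of
Fuchs [14]. Thus up to terms of higher order the operator norm of $A_R$ is
$\lambda_0 = \mu_0^d = 1 - 2\pi d \sqrt{2R}\, e^{-\pi R}$. Assume that
$\mathcal{B}(R,\delta) \ne \emptyset$ and $f \in \mathcal{B}(R,\delta)$, then

$$1 - \delta \le \int_{C_R} |f(t)|^2\,dt = \langle A_R f, f \rangle \le \lambda_0.$$

This implies that $\delta \ge 1 - \lambda_0 \ge 2\pi d \sqrt{2R}\, e^{-\pi R}$ (up to
terms of higher order in $R$).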

Next, let $C(\epsilon)$ be the function counting the number of eigenvalues of
$A_R^{(1)}$ exceeding $\epsilon$, precisely

$$(15)\qquad C(\epsilon) = \operatorname{card}\{ \mu_k : \mu_k \ge \epsilon \}.$$

Then Lemma 4.1 implies that

$$(16)\qquad C(\epsilon) \le \frac{R}{1 - a} + \kappa \log\frac{1}{\epsilon}.$$

A different estimate for the eigenvalue count was obtained by Landau and Widom [22]:

$$(17)\qquad C(\epsilon) = R + \frac{1}{\pi^2} \log\frac{1 - \epsilon}{\epsilon}\, \log R + o(\log R).$$

However, this is an asymptotic result for $R \to \infty$, and its proof leaves open
whether the term $o(\log R)$ can be chosen independent of $\epsilon$. By contrast,
the weaker estimate (16) works with a constant $\kappa$ independent of $R$, at the
price of the factor $(1 - a)^{-1}$. Since we need the eigenvalue behavior for fixed
$R$, we use Widom's earlier result.
Next consider the time-frequency limiting operator $A_R$ on $L^2(\mathbb{R}^d)$.
Clearly $A_R$ is the $d$-fold tensor product of $A_R^{(1)}$,
$A_R = A_R^{(1)} \otimes \cdots \otimes A_R^{(1)}$. Consequently, $\lambda$ is an
eigenvalue of $A_R$, $\lambda \in \sigma(A_R)$, if and only if
$\lambda = \prod_{j=1}^{d} \mu_{k_j}$, where $\mu_{k_j} \in \sigma(A_R^{(1)})$ is an
eigenvalue of the one-dimensional operator $A_R^{(1)}$. Since $0 < \mu_k < 1$, we have
$\prod_{j=1}^{d} \mu_{k_j} \ge \epsilon$ only when $\mu_{k_j} \ge \epsilon$ for
$j = 1, \dots, d$. Consequently,

$$(18)\qquad \{ \lambda \in \sigma(A_R) : \lambda \ge \epsilon \} \subseteq \Big\{ \lambda = \prod_{j=1}^{d} \mu_{k_j} : \mu_{k_j} \in \sigma(A_R^{(1)}),\ \mu_{k_j} \ge \epsilon \Big\}.$$

We arrange the eigenvalues of $A_R$ by magnitude
$1 > \lambda_1 \ge \lambda_2 \ge \lambda_3 \ge \cdots \ge \lambda_n \ge \lambda_{n+1} \ge \cdots > 0$
and again let $C(\epsilon) = \max\{ n : \lambda_n \ge \epsilon \}$ be the function
counting the number of eigenvalues of $A_R$ exceeding $\epsilon$. We choose $a = 1/2$
and combine (16) and (18); then the eigenvalue distribution for $A_R$ in dimension
$d$ is

$$(19)\qquad C(\epsilon) \le \Big( 2R + \kappa \log\frac{1}{\epsilon} \Big)^d,$$

where $\kappa$ is independent of $R$ and $\epsilon$.

4.2. Covering Number for $\mathcal{B}(R,\delta)$. Recall that the covering numbers
$N(\epsilon) = N(C, \epsilon)$ of a compact set $C$ in a Banach space are defined to
be the minimum number of balls of radius less than or equal to $\epsilon$ required
to cover $C$. For the covering number of balls in Euclidean space we use a
well-known estimate, see [8, p. 9] and [11, Prop. 5].

Lemma 4.2. Let $D(0,r) = \{ x \in \mathbb{C}^d : \|x\|_2 \le r \}$ be the ball of radius
$r$ in $\mathbb{C}^d$. The covering number of $D(0,r)$ is bounded by

$$(20)\qquad N(\epsilon) \le e^{2d \log\frac{4r}{\epsilon}}.$$

Let us note that the covering number of the shell
$D(0,r) \setminus D(0, r(1 - \delta))$ for some $\delta > 0$ is of the order
$N(\epsilon) = e^{2d \log\frac{4r}{\epsilon}} - e^{2d \log\frac{4r(1-\delta)}{\epsilon}} = e^{2d \log\frac{4r}{\epsilon}} \big( 1 - (1 - \delta)^{2d} \big)$.
The difference from the covering number of the full ball is thus negligible for large
dimensions.

In the main part of the argument we will use the restriction of bandlimited
functions to the cube $C_R$. Therefore we will use the local norms

$$\|f\|_{2,R} = \Big( \int_{C_R} |f(x)|^2\,dx \Big)^{1/2}, \qquad \|f\|_{\infty,R} = \sup_{x \in C_R} |f(x)|,$$

and we denote the restriction of $\mathcal{B}(R,\delta)$ to $C_R$ by

$$V(R,\delta) = P_R\, \mathcal{B}(R,\delta) = \{ f \in L^2(C_R) : f = \chi_{C_R} h \text{ for } h \in \mathcal{B}(R,\delta) \}.$$

Lemma 4.3. (i) $V(R,\delta)$ is a compact subset in $L^2(C_R)$.

(ii) The covering number $N_2(\epsilon)$ of $V(R,\delta)$ (with respect to
$\|\cdot\|_{2,R}$) is bounded by

$$(21)\qquad N_2(\epsilon) \le \exp\Big( 2^{d+1} \Big( R + \kappa \log\frac{2\sqrt{\delta}}{\epsilon} \Big)^d \log\frac{4\sqrt{2}}{\epsilon} \Big).$$

Proof. The finiteness of the covering numbers implies that $V(R,\delta)$ is compact,
so it suffices to prove (ii).

(ii) Let $\varphi_n$ be the normalized eigenfunctions of $A_R$ corresponding to the
eigenvalues $\lambda_n$. (These are tensor products of the standard prolate
spheroidal functions.) Then $\{ \varphi_n : n \in \mathbb{N} \}$ is an orthonormal
basis for $\mathcal{B}$. If $f = \sum_{n \in \mathbb{N}} c_n \varphi_n \in \mathcal{B}$,
then $\|f\|_2^2 = \sum_{n \in \mathbb{N}} |c_n|^2$ and

$$(22)\qquad \|f\|_{2,R}^2 = \int_{C_R} |f(x)|^2\,dx = \langle A_R f, f \rangle = \sum_{n \in \mathbb{N}} |c_n|^2 \lambda_n.$$

Consequently $f \in \mathcal{B}(R,\delta)$ if and only if
$c \in S_\delta = \{ c \in \ell^2 : \|c\|_2 = 1,\ \sum_{n=1}^{\infty} |c_n|^2 \lambda_n \ge 1 - \delta \}$.
Then $V(R,\delta)$ (with the local $\|\cdot\|_{2,R}$-norm) and $S_\delta$ (with the
weighted $\ell^2$-norm) are isomorphic and their covering numbers are identical.

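We first determine a suitable cutoff $D$ such that the remainder
$\sum_{n > D} |c_n|^2 \lambda_n$ is uniformly small over $S_\delta$: since
$\lambda_n \le 1$, we have

$$\begin{aligned}
1 - \sum_{n > D} |c_n|^2 = \sum_{n \le D} |c_n|^2 &\ge \sum_{n \le D} |c_n|^2 \lambda_n \\
&= \sum_{n} |c_n|^2 \lambda_n - \sum_{n > D} |c_n|^2 \lambda_n \\
&\ge 1 - \delta - \sum_{n > D} |c_n|^2 \lambda_n \\
&\ge 1 - \delta - \lambda_{D+1} \sum_{n > D} |c_n|^2.
\end{aligned}$$

We obtain that

$$\delta \ge (1 - \lambda_{D+1}) \sum_{n > D} |c_n|^2,$$

and thus

$$(23)\qquad \sum_{n > D} |c_n|^2 \lambda_n \le \lambda_{D+1} \sum_{n > D} |c_n|^2 \le \frac{\lambda_{D+1}}{1 - \lambda_{D+1}}\, \delta.$$

Given $\epsilon > 0$, we choose the minimal $D$ so that
$\lambda_{D+1} \le \frac{\epsilon^2}{4\delta}$; then
$\frac{\lambda_{D+1}}{1 - \lambda_{D+1}} \delta \le 2\lambda_{D+1}\delta \le \epsilon^2/2$.
According to (19) we may choose $D$ to be

$$(24)\qquad D = C\Big( \frac{\epsilon^2}{4\delta} \Big) \le \Big( 2R + \kappa \log\frac{4\delta}{\epsilon^2} \Big)^d = 2^d \Big( R + \kappa \log\frac{2\sqrt{\delta}}{\epsilon} \Big)^d.$$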

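Once $D$ is determined, choose an $\frac{\epsilon}{\sqrt{2}}$-net
$\{ a_j : j = 1, \dots, N_2 \}$ for the unit ball in $\mathbb{C}^D$ (with respect to the
Euclidean norm) and set $f_j = \sum_{n \le D} a_j(n) \varphi_n$. By Lemma 4.2 the
cardinality of this net is at most $N_2 = e^{2D \log\frac{4\sqrt{2}}{\epsilon}}$.

Given $f = \sum_{n=1}^{\infty} c_n \varphi_n \in \mathcal{B}(R,\delta)$, choose $a_j$
and the corresponding $f_j \in \mathcal{B}$, such that
$\sum_{n \le D} |c_n - a_j(n)|^2 \le \epsilon^2/2$. Then by (23) and the definition of
$D$

$$\|f - f_j\|_{2,R}^2 = \sum_{n \le D} |c_n - a_j(n)|^2 \lambda_n + \sum_{n > D} |c_n|^2 \lambda_n \le \frac{\epsilon^2}{2} + \frac{\epsilon^2}{2} = \epsilon^2.$$

Thus $\{f_j\}$ is an $\epsilon$-net for $V(R,\delta)$ with respect to
$\|\cdot\|_{2,R}$. Now Lemma 4.2 and (24) imply that the cardinality of this
$\epsilon$-net is at most

$$(25)\qquad N_2(\epsilon) \le \exp\Big( 2D \log\frac{4\sqrt{2}}{\epsilon} \Big) \le \exp\Big( 2C\Big( \frac{\epsilon^2}{4\delta} \Big) \log\frac{4\sqrt{2}}{\epsilon} \Big) \le \exp\Big( 2^{d+1} \Big( R + \kappa \log\frac{2\sqrt{\delta}}{\epsilon} \Big)^d \log\frac{4\sqrt{2}}{\epsilon} \Big).$$

□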
REMARK: In dimension $d = 1$ a similar estimate for the covering number was
obtained in [17]. Estimate (25) also follows from general principles in
approximation theory [26] (Ch. 4.2, in particular Thm. 2.5 and Cor. 2.6). The
estimate of the covering number by means of the eigenvalue distribution,
equivalently between entropy numbers and approximation numbers, goes back to an
inequality of Mityagin [24, Ch. 9].

As our next step we want a similar estimate for the covering number of
$V(R,\delta)$ in the local $\|\cdot\|_{\infty,R}$-norm. For this recall a basic
inequality for bandlimited functions:

$$(26)\qquad \|f\|_\infty \le \|f\|_2 \qquad \forall f \in \mathcal{B}.$$

A similar comparison for the local norms is given in the next lemma.
Lemma 4.4. If f E B, then 



(27) 



,r = m ax |/(a;) | < K d 

x&Cr 



where the constant Kd depends only on the dimension d and is of order 0(d). 

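Proof. We assume first that $f$ is real-valued and that $\alpha = \max_{x\in C_R} |f(x)|$ is taken at $x_0 \in C_R$. By switching sign if necessary, we have $\alpha = f(x_0) \ge |f(x)|$ for all $x \in C_R$. Next observe that by (26) and Bernstein's inequality we have
\[
\Big\| \frac{\partial f}{\partial x_j} \Big\|_\infty \le \Big\| \frac{\partial f}{\partial x_j} \Big\|_2 \le \pi \|f\|_2 \qquad \text{for } f \in \mathcal{B} ,
\]
and consequently
\[
\|\, |\nabla f| \,\|_\infty = \max_{x\in\mathbb{R}^d} \Big( \sum_{j=1}^d \Big| \frac{\partial f}{\partial x_j}(x) \Big|^2 \Big)^{1/2} \le \pi\sqrt{d}\, \|f\|_2 \qquad \text{for } f \in \mathcal{B} .
\]

Since $f(x) = f(x_0) + \nabla f(\xi)\cdot(x - x_0)$ for some $\xi \in \mathbb{R}^d$, we obtain a lower estimate for $f$ near its maximum at $x_0$ by
\[
|f(x)| \ge \alpha - \|\, |\nabla f| \,\|_\infty\, |x - x_0| \ge \alpha - \pi\sqrt{d}\, \|f\|_2\, |x - x_0| \ge 0
\]
on the ball $B(x_0,\beta) = \{x : |x - x_0| \le \alpha/(\pi\sqrt{d}\,\|f\|_2) =: \beta\}$. We note that $\beta = \|f\|_{\infty,R}/(\pi\sqrt{d}\,\|f\|_2) \le (\pi\sqrt{d})^{-1}$ by (26), and thus a fixed portion of the ball $B(x_0,\beta)$ is always contained in $C_R$.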




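Consequently (with $\sigma_{d-1}$ denoting the surface area of the $(d-1)$-dimensional unit sphere in $\mathbb{R}^d$)
\[
\begin{aligned}
\int_{C_R} |f(x)|^2\, dx &\ge \int_{C_R \cap B(x_0,\beta)} \big(\alpha - \pi\sqrt{d}\,\|f\|_2\, |x - x_0|\big)^2\, dx \\
&\ge \frac{1}{2^d} \int_{B(x_0,\beta)} \big(\alpha - \pi\sqrt{d}\,\|f\|_2\, |x - x_0|\big)^2\, dx \\
&= \frac{1}{2^d} \int_{|x|\le\beta} \big(\alpha - \pi\sqrt{d}\,\|f\|_2\, |x|\big)^2\, dx \\
&= \frac{\sigma_{d-1}}{2^d}\, d\pi^2 \|f\|_2^2 \int_0^\beta (\beta - r)^2 r^{d-1}\, dr \\
&= \frac{\sigma_{d-1}}{2^d}\, d\pi^2 \|f\|_2^2\, \frac{2}{d(d+1)(d+2)}\, \beta^{d+2} .
\end{aligned}
\]

Unraveling this inequality, we obtain that
\[
\begin{aligned}
\|f\|_{\infty,R} = \max_{x\in C_R} |f(x)| &= \pi\sqrt{d}\, \|f\|_2\, \beta \\
&\le \big(2^{d-1} \sigma_{d-1}^{-1} (d+1)(d+2)\big)^{\frac{1}{d+2}} (\pi\sqrt{d})^{\frac{d}{d+2}}\, \|f\|_2^{\frac{d}{d+2}} \Big( \int_{C_R} |f(x)|^2\, dx \Big)^{\frac{1}{d+2}} \\
&= K_d'\, \|f\|_2^{\frac{d}{d+2}}\, \|f\|_{2,R}^{\frac{2}{d+2}} .
\end{aligned}
\]

For complex-valued $f \in \mathcal{B}$, we have to take $K_d = 2K_d' = 2\big(2^{d-1} \sigma_{d-1}^{-1} (d+1)(d+2)\big)^{\frac{1}{d+2}} (\pi\sqrt{d})^{\frac{d}{d+2}}$. Using $\sigma_{d-1} = d\pi^{d/2}/\Gamma(d/2+1)$, one can then show that $K_d = O(d)$. □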

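Corollary 4.5. (i) $\mathcal{B}(R,\delta)$ is a compact subset in $C([-R/2,R/2]^d)$.

(ii) The covering number $N(\epsilon)$ of $\mathcal{B}(R,\delta)$ with respect to $\|\cdot\|_{\infty,R}$ is bounded by
\[
N(\epsilon) \le \exp\Big( 2^{d+1} \Big(R + \kappa\big(\tfrac{d}{2}+1\big) \log\frac{2K_d}{\epsilon}\Big)^d \log\frac{4K_d}{\epsilon} \Big) . \tag{28}
\]

Proof. Given $\epsilon > 0$, set $\epsilon_0 = 2^{-d/2} (\epsilon/K_d)^{d/2+1}$ and let $\{f_j\}$ be an $\epsilon_0$-net with respect to $\|\cdot\|_{2,R}$. If $f \in \mathcal{B}(R,\delta)$ and $\|f - f_j\|_{2,R} \le \epsilon_0$, then we have
\[
\|f - f_j\|_{\infty,R} \le K_d\, \|f - f_j\|_2^{\frac{d}{d+2}}\, \|f - f_j\|_{2,R}^{\frac{2}{d+2}} \le K_d\, 2^{\frac{d}{d+2}}\, \epsilon_0^{\frac{2}{d+2}} \le \epsilon .
\]
Thus $\{f_j\}$ is an $\epsilon$-net for $\mathcal{B}(R,\delta)$ with respect to $\|\cdot\|_{\infty,R}$ and $N(\epsilon) \le N_2(\epsilon_0)$. Now use Lemma 4.3 and estimate the occurring logarithmic term by
\[
\log\frac{2}{\epsilon_0} = \log\Big( 2^{d/2+1} \Big(\frac{K_d}{\epsilon}\Big)^{d/2+1} \Big) \le \Big(\frac{d}{2}+1\Big) \log\frac{2K_d}{\epsilon} . \qquad □
\]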

The precise order of the covering number for $d = 1$ with respect to the local supremum norm $\|\cdot\|_{\infty,R}$ was derived by Buslaev and Vitushkin [5]. Their technique is specifically one-dimensional and yields $N(\epsilon) = e^{R\log(C/\epsilon)}$ for some constant $C > 0$.

We will work with $\epsilon$-nets in the $\|\cdot\|_{\infty,R}$-norm for $\epsilon = 2^{-\ell}$, $\ell = 1, 2, \dots$. In this case the covering number can be rewritten as
\[
N(2^{-\ell}) \le \exp\Big( 2^{d+1} \Big(R + \big(\tfrac{d}{2}+1\big)\kappa \big((\ell+1)\log 2 + \log K_d\big)\Big)^d \big((\ell+2)\log 2 + \log K_d\big) \Big) := \exp p(\ell) , \tag{29}
\]
where $p(\ell) = 2^{d+1} \big(R + (\tfrac{d}{2}+1)\kappa ((\ell+1)\log 2 + \log K_d)\big)^d \big((\ell+2)\log 2 + \log K_d\big)$ is a polynomial of degree $d+1$.

What is crucial in the above estimate is that the exponent grows polynomially in $\ell$, but not faster.
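To make the polynomial growth of the exponent concrete, here is a small sketch that evaluates $p(\ell)$ from (29) and compares it with $2^{\ell/2}$, the quantity it is later played off against. The parameter values for `kappa` and `K_d` are placeholders chosen for illustration, not values taken from the paper.

```python
import math

def p(ell, d, R, kappa, K_d):
    """Exponent p(l) from (29): N(2^-l) <= exp(p(l))."""
    a = (ell + 1) * math.log(2) + math.log(K_d)
    b = (ell + 2) * math.log(2) + math.log(K_d)
    return 2 ** (d + 1) * (R + (d / 2 + 1) * kappa * a) ** d * b

# Illustrative parameters only; kappa and K_d are placeholders.
d, R, kappa, K_d = 2, 10.0, 1.0, 4.0
for ell in (1, 5, 10, 20, 40):
    value = p(ell, d, R, kappa, K_d)
    print(ell, value, value / 2 ** (ell / 2))
```

With these placeholder values the ratio $p(\ell)/2^{\ell/2}$ eventually decays, while doubling a large $\ell$ multiplies $p(\ell)$ by roughly $2^{d+1}$, as expected for a polynomial of degree $d+1$.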

4.3. Preparation for the proof of Theorem 3.1. Assume that $\{x_j : j \in \mathbb{N}\}$ is an infinite sequence of i.i.d. random variables, each of which is uniformly distributed over the cube $C_R$.

For every $f \in \mathcal{B}$ we introduce the random variable
\[
Y_j(f) = |f(x_j)|^2 - \frac{1}{R^d} \int_{C_R} |f(x)|^2\, dx = |f(x_j)|^2 - \mathbb{E}\big[ |f(x_j)|^2 \big] . \tag{30}
\]
Then $Y_j(f)$ is a sequence of independent random variables with $\mathbb{E}\, Y_j(f) = 0$. We first estimate the probability distribution of the random variable
\[
\sup_{f\in\mathcal{B}(R,\delta)} \Big| \sum_{j=1}^r Y_j(f) \Big| .
\]
For the repeated application of Bernstein's inequality for sums of independent random variables we will need the following estimates for the $Y_j(f)$'s.

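Lemma 4.6. Let $f, g \in \mathcal{B}(R,\delta)$ and $j \in \mathbb{N}$. Then the following inequalities hold:
\[
\operatorname{Var} Y_j(f) \le \frac{1}{R^d} , \tag{31}
\]
\[
\operatorname{Var}\big(Y_j(f) - Y_j(g)\big) \le \frac{4}{R^d}\, \|f - g\|_{\infty,R}^2 , \tag{32}
\]
\[
\|Y_j(f)\|_\infty \le 1 , \tag{33}
\]
\[
\|Y_j(f) - Y_j(g)\|_\infty \le 2\, \|f - g\|_{\infty,R} . \tag{34}
\]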

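Proof. We abbreviate the expected value of $|f(x_j)|^2$ by $m(f) = R^{-d} \int_{C_R} |f(x)|^2\, dx$. Using (26), we obtain
\[
\operatorname{Var} Y_j(f) = \mathbb{E}\big[Y_j(f)^2\big] = \mathbb{E}\big[ |f(x_j)|^4 \big] - m(f)^2 = \frac{1}{R^d} \int_{C_R} |f(x)|^4\, dx - m(f)^2 \le \frac{1}{R^d}\, \|f\|_\infty^2\, \|f\|_2^2 \le \frac{1}{R^d} .
\]
Similarly, we obtain
\[
\|Y_j(f)\|_\infty = \sup_\omega \big|\, |f(x_j(\omega))|^2 - m(f) \big| \le \max\Big( \|f\|_\infty^2 ,\ \frac{1}{R^d} \int_{C_R} |f(x)|^2\, dx \Big) \le 1 .
\]

To prove (32), we write
\[
\begin{aligned}
\operatorname{Var}\big(Y_j(f) - Y_j(g)\big) &= \mathbb{E}\big(Y_j(f) - Y_j(g)\big)^2 \\
&= \frac{1}{R^d} \int_{C_R} \big( |f(x)|^2 - |g(x)|^2 \big)^2\, dx - \big( m(f) - m(g) \big)^2 \\
&\le \frac{1}{R^d} \int_{C_R} |f(x) - g(x)|^2 \big( |f(x)| + |g(x)| \big)^2\, dx \\
&\le \frac{2}{R^d}\, \|f - g\|_{\infty,R}^2 \int_{C_R} \big( |f(x)|^2 + |g(x)|^2 \big)\, dx \le \frac{4}{R^d}\, \|f - g\|_{\infty,R}^2 .
\end{aligned}
\]
The last estimate follows similarly from
\[
\begin{aligned}
\|Y_j(f) - Y_j(g)\|_\infty &\le \sup_\omega \Big|\, |f(x_j(\omega))|^2 - |g(x_j(\omega))|^2 - \frac{1}{R^d} \int_{C_R} \big( |f(x)|^2 - |g(x)|^2 \big)\, dx \Big| \\
&\le \big\|\, |f|^2 - |g|^2 \big\|_{\infty,R} \\
&\le \|f - g\|_{\infty,R}\, \big\|\, |f| + |g| \,\big\|_\infty \\
&\le 2\, \|f - g\|_{\infty,R} .
\end{aligned}
\]
□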
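The bounds (31) and (33) can be observed in a simulation. The sketch below is an illustration in dimension $d = 1$ under the assumption $f(x) = \sin(\pi x)/(\pi x)$, which has spectrum $[-1/2, 1/2]$ and $\|f\|_2 = 1$; the helper name `sample_Y`, the grid size, and the choice $R = 4$ are hypothetical.

```python
import numpy as np

def sample_Y(f, R, r, rng, grid=200001):
    """Draw Y_j(f) = |f(x_j)|^2 - m(f) for r i.i.d. uniform points on [-R/2, R/2] (d = 1)."""
    x = np.linspace(-R / 2, R / 2, grid)
    fx2 = np.abs(f(x)) ** 2
    # m(f) = R^-1 * integral of |f|^2 over C_R, via the trapezoidal rule
    m = np.sum((fx2[1:] + fx2[:-1]) / 2) * (x[1] - x[0]) / R
    xj = rng.uniform(-R / 2, R / 2, size=r)
    return np.abs(f(xj)) ** 2 - m

rng = np.random.default_rng(1)
R = 4.0
# np.sinc(x) = sin(pi x) / (pi x): bandlimited to [-1/2, 1/2] with unit L2 norm
Y = sample_Y(np.sinc, R, 100_000, rng)
print("mean:", Y.mean(), "variance:", Y.var(), "max |Y|:", np.abs(Y).max())
```

The empirical mean is close to zero, the empirical variance stays below $R^{-d}$, and every sample satisfies $|Y_j(f)| \le 1$.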



4.4. Proof of the sampling inequality. The sampling inequality follows from a uniform large deviation inequality for the sampling of bandlimited functions.

Theorem 4.7. Let $\{x_j : j \in \mathbb{N}\}$ be a sequence of i.i.d. random variables that are uniformly distributed over $C_R = [-R/2, R/2]^d$. Then there exist constants $A, B > 0$ depending on $d$ and $R$, such that
\[
P\Big( \sup_{f\in\mathcal{B}(R,\delta)} \Big| \sum_{j=1}^r Y_j(f) \Big| \ge \lambda \Big) \le 2A \exp\Big( -B\, \frac{\lambda^2}{41 r R^{-d} + \lambda} \Big) \tag{35}
\]
for $r \in \mathbb{N}$ and $\lambda > 0$.

Here $B = \sqrt{2}/36$. If $R$ is sufficiently large, $A$ is of order $A = \exp(CR^d)$ for a constant $C$ depending only on $d$ and $\kappa$.

Before we prove the large deviation inequality, we show how the main theorem follows from Theorem 4.7.

Proof of Theorem 3.1. Choose $\lambda = r\nu/R^d$ and recall that $Y_j(f) = |f(x_j)|^2 - R^{-d} \int_{C_R} |f(x)|^2\, dx$. Thus the event $\mathcal{E} = \{ \sup_{f\in\mathcal{B}(R,\delta)} | \sum_{j=1}^r Y_j(f) | \le r\nu/R^d \}$ coincides with the event
\[
\frac{r}{R^d} \int_{C_R} |f(x)|^2\, dx - \frac{r\nu}{R^d} \le \sum_{j=1}^r |f(x_j)|^2 \le \frac{r}{R^d} \int_{C_R} |f(x)|^2\, dx + \frac{r\nu}{R^d} \qquad \text{for all } f \in \mathcal{B}(R,\delta) . \tag{36}
\]
Since by definition $1 - \delta \le \int_{C_R} |f(x)|^2\, dx \le 1$, we find that the event of the uniform sampling inequality
\[
\frac{r(1 - \delta - \nu)}{R^d} \le \sum_{j=1}^r |f(x_j)|^2 \le \frac{r(1 + \nu)}{R^d} \qquad \text{for all } f \in \mathcal{B}(R,\delta) \tag{37}
\]
contains $\mathcal{E}$. As a consequence of Theorem 4.7 the sampling inequality (37) holds uniformly for all $f \in \mathcal{B}(R,\delta)$ with probability at least
\[
1 - 2A \exp\Big( -B\, \frac{r}{R^d}\, \frac{\nu^2}{41 + \nu} \Big) .
\]
This proves Theorem 3.1. □

We are left to prove the probability estimate of Theorem 4.7. To estimate the probability of the deviation of a sum of random variables from its average we use Bernstein's inequality for sums of independent random variables [2]:

Let $Y_j$, $j = 1, \dots, r$, be a sequence of bounded, independent random variables with $\mathbb{E}\, Y_j = 0$, $\operatorname{Var} Y_j \le \sigma^2$, and $\|Y_j\|_\infty \le M$ for $j = 1, \dots, r$. Then
\[
P\Big( \Big| \sum_{j=1}^r Y_j \Big| \ge \lambda \Big) \le 2 \exp\Big( - \frac{\lambda^2}{2r\sigma^2 + \frac{2}{3} M \lambda} \Big) . \tag{38}
\]
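For orientation, the right-hand side of (38) is easy to evaluate. The following sketch plugs in the single-function bounds $\sigma^2 = R^{-d}$ and $M = 1$ from (31) and (33); the function name `bernstein_tail` and the numerical values of $d$, $R$, $r$ are illustrative assumptions.

```python
import math

def bernstein_tail(lam, r, sigma2, M):
    """Right-hand side of (38): 2 exp(-lam^2 / (2 r sigma^2 + (2/3) M lam))."""
    return 2.0 * math.exp(-lam ** 2 / (2.0 * r * sigma2 + (2.0 / 3.0) * M * lam))

# Single-function case with the bounds (31) and (33): sigma^2 = R^-d, M = 1.
d, R, r = 1, 4.0, 2000
sigma2 = R ** (-d)
for lam in (50.0, 100.0, 200.0):
    print(lam, bernstein_tail(lam, r, sigma2, 1.0))
```

For small $\lambda$ the variance term $2r\sigma^2$ dominates and the bound is of Gaussian type; for large $\lambda$ the term $\frac{2}{3}M\lambda$ takes over and the decay is only exponential in $\lambda$.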



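Proof of Theorem 4.7. Step 1: A metric entropy argument. For a given $\ell \in \mathbb{N}$, we construct a $2^{-\ell}$-covering for $\mathcal{B}(R,\delta)$ with respect to the local norm $\|\cdot\|_{\infty,R}$. Let $\mathcal{A}(2^{-\ell})$ be the corresponding $2^{-\ell}$-net for $\ell = 1, 2, \dots$. Then $\mathcal{A}(2^{-\ell})$ has cardinality at most $N(2^{-\ell}) \le e^{p(\ell)}$ for some polynomial $p$ of degree $d+1$ by Corollary 4.5.

Given $f \in \mathcal{B}(R,\delta)$, let $f_\ell$ be the function in $\mathcal{A}(2^{-\ell})$ that is closest to $f$ in $\|\cdot\|_{\infty,R}$-norm, with some convention for breaking ties. Since $\|f - f_\ell\|_{\infty,R} \to 0$, we can write
\[
Y_j(f) = Y_j(f_1) + \big(Y_j(f_2) - Y_j(f_1)\big) + \big(Y_j(f_3) - Y_j(f_2)\big) + \cdots .
\]
If $\sup_{f\in\mathcal{B}(R,\delta)} | \sum_{j=1}^r Y_j(f) | \ge \lambda$, then $\mathcal{E}_\ell$ must hold for some $\ell \ge 1$, where
\[
\mathcal{E}_1 = \Big\{ \text{there exists } f_1 \in \mathcal{A}(1/2) \text{ such that } \Big| \sum_{j=1}^r Y_j(f_1) \Big| \ge \frac{\lambda}{2} \Big\}
\]
and, for $\ell \ge 2$,
\[
\mathcal{E}_\ell = \Big\{ \text{there exist } f_\ell \in \mathcal{A}(2^{-\ell}) \text{ and } f_{\ell-1} \in \mathcal{A}(2^{-\ell+1}) \text{ with } \|f_\ell - f_{\ell-1}\|_{\infty,R} \le 3\cdot 2^{-\ell}, \text{ such that } \Big| \sum_{j=1}^r \big( Y_j(f_\ell) - Y_j(f_{\ell-1}) \big) \Big| \ge \frac{\lambda}{2\ell^2} \Big\} .
\]
If this were not the case, then, with $f_0 = 0$,
\[
\Big| \sum_{j=1}^r Y_j(f) \Big| \le \sum_{\ell=1}^\infty \Big| \sum_{j=1}^r \big( Y_j(f_\ell) - Y_j(f_{\ell-1}) \big) \Big| < \sum_{\ell=1}^\infty \frac{\lambda}{2\ell^2} = \frac{\pi^2}{12}\, \lambda < \lambda .
\]
Next we estimate the probability of $\mathcal{E}_\ell$.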

Step 2. We estimate the term $\ell = 1$ separately. For fixed $f \in \mathcal{A}(1/2)$, the probability that $| \sum_{j=1}^r Y_j(f) | \ge \lambda/2$ is bounded, using Bernstein's inequality (38) and Lemma 4.6, by
\[
2 \exp\Big( - \frac{\lambda^2/4}{2rR^{-d} + \lambda/3} \Big) .
\]
There are at most $N(1/2) \le \exp\big( 2^{d+1} (R + \kappa(\tfrac{d}{2}+1) \log 4K_d)^d \log 8K_d \big)$ functions in $\mathcal{A}(1/2)$, so the probability of $\mathcal{E}_1$ is bounded by
\[
2 \exp\Big( 2^{d+1} \big(R + \kappa(\tfrac{d}{2}+1) \log 4K_d\big)^d \log 8K_d \Big) \exp\Big( - \frac{\lambda^2/4}{2rR^{-d} + \lambda/3} \Big) . \tag{39}
\]

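Step 3. For $\ell \ge 2$, we estimate the probability of $\mathcal{E}_\ell$ in a similar fashion by using Lemma 4.6 and (38). If $f \in \mathcal{A}(2^{-\ell})$ and $g \in \mathcal{A}(2^{-\ell+1})$ with $\|f - g\|_{\infty,R} \le 3\cdot 2^{-\ell}$, we have
\[
\begin{aligned}
P\Big( \Big| \sum_{j=1}^r \big( Y_j(f) - Y_j(g) \big) \Big| \ge \frac{\lambda}{2\ell^2} \Big) &\le 2 \exp\Big( - \frac{\lambda^2/(4\ell^4)}{2r\cdot 4\cdot R^{-d} (3\cdot 2^{-\ell})^2 + \frac{2}{3}\cdot 2\cdot 3\cdot 2^{-\ell-1} \lambda/\ell^2} \Big) \\
&= 2 \exp\Big( - \frac{2^\ell}{8\ell^2}\, \frac{\lambda^2}{36\,\ell^2 2^{-\ell}\, r R^{-d} + \lambda} \Big) .
\end{aligned}
\]
Note $36\ell^2/2^\ell \le 41$. There are at most $N(2^{-\ell})$ functions in $\mathcal{A}(2^{-\ell})$ and $N(2^{-\ell+1})$ functions in $\mathcal{A}(2^{-\ell+1})$. Finally, this can happen for any $\ell$. So the probability of $\bigcup_{\ell=2}^\infty \mathcal{E}_\ell$ is bounded by
\[
\sum_{\ell=2}^\infty N(2^{-\ell}) N(2^{-\ell+1})\, 2 \exp\Big( - \frac{2^\ell}{8\ell^2}\, \frac{\lambda^2}{41 r R^{-d} + \lambda} \Big) \le \sum_{\ell=2}^\infty 2 \exp\Big( p(\ell) + p(\ell-1) - \frac{2^\ell}{8\ell^2}\, \frac{\lambda^2}{41 r R^{-d} + \lambda} \Big) , \tag{40}
\]
where we use (29) for the covering number.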
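Step 4. We will need the following inequality: If $\rho > 0$ and $a > 1$, then
\[
\sum_{\ell=2}^\infty e^{-\rho a^\ell} \le \frac{1}{\rho\, a \log a}\, e^{-a\rho} . \tag{41}
\]
This inequality follows from the integral test and the substitution $a^x = u$:
\[
\sum_{\ell=2}^\infty e^{-\rho a^\ell} \le \int_1^\infty e^{-\rho a^x}\, dx = \frac{1}{\log a} \int_a^\infty e^{-\rho u}\, \frac{du}{u} \le \frac{1}{a \log a} \int_a^\infty e^{-\rho u}\, du = \frac{1}{\rho\, a \log a}\, e^{-a\rho} .
\]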
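Inequality (41) can be sanity-checked numerically. The sketch below compares a truncated version of the left-hand sum with the right-hand side for $a = \sqrt{2}$, the value that arises when $2^{\ell/2}$ is written as $a^\ell$; the helper names and the sample values of $\rho$ are illustrative.

```python
import math

def lhs_sum(rho, a, terms=200):
    """Partial sum of sum_{l>=2} exp(-rho * a**l); the tail decays doubly exponentially."""
    total = 0.0
    for ell in range(2, terms):
        t = rho * a ** ell
        if t > 700:        # exp(-t) is negligible (below double precision) from here on
            break
        total += math.exp(-t)
    return total

def rhs_bound(rho, a):
    """Right-hand side of (41): exp(-a*rho) / (rho * a * log a)."""
    return math.exp(-a * rho) / (rho * a * math.log(a))

a = math.sqrt(2)  # 2**(l/2) = sqrt(2)**l
for rho in (0.1, 1.0, 5.0):
    print(rho, lhs_sum(rho, a), rhs_bound(rho, a))
```

In each case the sum stays below the bound, and both sides decay like $e^{-a\rho}$ as $\rho$ grows.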
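Step 5. To estimate the sum (40), we rewrite and simplify each term. Set
\[
\psi = \frac{\lambda^2}{41 r R^{-d} + \lambda} , \tag{42}
\]
\[
c_1 = \min_{\ell\ge 2} \frac{2^{\ell/2}}{8\ell^2} , \tag{43}
\]
\[
c_2 = \max_{\ell\ge 2} \frac{p(\ell) + p(\ell-1)}{2^{\ell/2}} , \tag{44}
\]
and recall that $\sup_{\ell\ge 2} \ell^2/2^\ell = 9/8$. Then the $\ell$-th term in (40) is majorized by
\[
\exp\big( -2^{\ell/2} (c_1\psi - c_2) \big) .
\]
If $\psi > 0$ is large enough so that $\rho := c_1\psi - c_2 > 0$, then (41) with $a = \sqrt{2}$ implies that
\[
\begin{aligned}
P\Big( \bigcup_{\ell=2}^\infty \mathcal{E}_\ell \Big) &\le 2 \sum_{\ell=2}^\infty e^{-2^{\ell/2}(c_1\psi - c_2)} \le \frac{2\sqrt{2}}{\log 2\, (c_1\psi - c_2)}\, e^{-\sqrt{2}(c_1\psi - c_2)} \\
&= \frac{2\sqrt{2}\, e^{\sqrt{2} c_2}}{\log 2\, (c_1\psi - c_2)} \exp\Big( - \frac{\sqrt{2}\, c_1 \lambda^2}{41 r R^{-d} + \lambda} \Big) .
\end{aligned} \tag{45}
\]
Since the term for $\ell = 1$ has the same form, we have proved that
\[
P\Big( \sup_{f\in\mathcal{B}(R,\delta)} \Big| \sum_{j=1}^r Y_j(f) \Big| \ge \lambda \Big) \le 2A \exp\Big( - \frac{\sqrt{2}\, c_1 \lambda^2}{41 r R^{-d} + \lambda} \Big)
\]
whenever $\psi > c_2/c_1$.

For the exponent $B$ we may take the smaller of the exponents in (39) and (45), i.e., $B = \min(3/4, \sqrt{2} c_1)$. If we choose $\lambda$ large enough, so that $c_1\psi - c_2 \ge 2\sqrt{2}/\log 2$, then we may take $A = \max\big( \exp\big( 2^{d+1} (R + \kappa(\tfrac{d}{2}+1) \log 4K_d)^d \log 8K_d \big),\ e^{\sqrt{2} c_2} \big)$. Thus we have proved Theorem 4.7.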

Step 6. To obtain an idea of the magnitude of the constants involved, we give some rough estimates for $c_1$ and $c_2$, $A$ and $B$.

For $c_1$ we obtain
\[
c_1 = \frac{1}{8} \min_{\ell\ge 2} \frac{2^{\ell/2}}{\ell^2} = \frac{1}{36} ,
\]
so the exponent $B$ in (35) is $\sqrt{2} c_1 = \sqrt{2}/36$, which is approximately $0.0393$.
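The two numerical constants used in the proof, $c_1 = 1/36$ and the bound $36\ell^2/2^\ell \le 41$, can be reproduced with a short scan over $\ell$. This is a sketch; the scan range $2 \le \ell < 60$ is an arbitrary cutoff, which suffices here because $2^{\ell/2}/\ell^2$ increases for $\ell \ge 6$ and $\ell^2/2^\ell$ decreases for $\ell \ge 3$.

```python
from fractions import Fraction

# min over l >= 2 of 2^(l/2) / l^2 (the ratio increases for l >= 6)
ratios = {ell: 2 ** (ell / 2) / ell ** 2 for ell in range(2, 60)}
ell_min = min(ratios, key=ratios.get)
c1 = ratios[ell_min] / 8
B = 2 ** 0.5 * c1

# max over l >= 2 of 36 l^2 / 2^l, which justifies the constant 41 in Step 3
worst = max(Fraction(36 * ell ** 2, 2 ** ell) for ell in range(2, 60))

print(ell_min, c1, B, worst)
```

The minimum is attained at $\ell = 6$ and the maximum of $36\ell^2/2^\ell$ at $\ell = 3$, where it equals $81/2 = 40.5$.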

As for $c_2$, recall that $p(\ell) = 2^{d+1} \big(R + (\tfrac{d}{2}+1)\kappa ((\ell+1)\log 2 + \log K_d)\big)^d \big((\ell+2)\log 2 + \log K_d\big)$. If $(\tfrac{d}{2}+1)\kappa ((\ell+1)\log 2 + \log K_d) \le R$, then
\[
\frac{p(\ell)}{2^{\ell/2}} \le 2^{d+1} (2R)^d \max_{\ell} \frac{(\ell+2)\log 2 + \log K_d}{2^{\ell/2}} \le c_3 R^d .
\]
In the other case, we may estimate $p(\ell)/2^{\ell/2}$ by a constant that depends on $d$, $K_d$ and $\kappa$, but not on $R$. Thus for $R$ sufficiently large, we obtain $c_2 \le 2c_3 R^d$ and $A \le \exp(CR^d)$.

Finally consider the condition $c_1\psi - c_2 \ge 2\sqrt{2}/\log 2$, which follows from
\[
\psi = \frac{\lambda^2}{41 r R^{-d} + \lambda} \ge c_4 R^d \ge \frac{c_2 + 2\sqrt{2}/\log 2}{c_1} .
\]
Since $x \ge \beta + \sqrt{D}$ implies $x^2 \ge \beta x + D$, we find that it suffices to take
\[
\lambda \ge c_4 R^d + (41 c_4 r)^{1/2} ,
\]
for a constant $c_4$ independent of $R$. □